perm filename READB[B2,JMC]1 blob
sn#760769 filedate 1984-07-12 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00006 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 \section{The basic functions and predicates on S-expressions.}
C00024 00003 \section{A more abstract view of S-expressions.}
C00032 00004 \section{Conditional terms.}
C00050 00005 \section{Recursive function definitions.}
C00075 00006 \subsubsection*{Exercises}
C00078 ENDMK
C⊗;
\section{The basic functions and predicates on S-expressions.}
\label{basicfns}
There are three basic functions on S-expressions:
{\it cons} for constructing an S-expression from two given S-expressions,
and {\it car} and {\it cdr} for selecting the first and second components
of a non-atomic S-expression.
There are two predicates, {\it atom} for testing whether or not an S-expression
is atomic and {\it eq} for testing equality of atoms.
In this section we will describe the basic \lisp\ programs
that compute these functions and predicates from list structure
representations of the arguments. All \lisp\ programs for computing
functions and predicates on S-expressions are built
from these basic functions as will be explained in the sections following this.
This is analogous to numeric programs which are based on
the arithmetic functions of addition, subtraction, multiplication,
and division, and the arithmetic predicates of equality and comparison.
First we introduce the notation to be used for writing \lisp
terms. This notation will be extended in later sections to express
\lisp\ programs as well. A \lisp\ term is an expression intended
to denote an S-expression (the value of the term).
We will use two languages for writing \lisp\ terms and programs -
{\it internal language} and {\it external language}.
Internal language represents \lisp\ terms and programs as S-expressions.
Internal language programs are somewhat harder for a person
to read and write, but it is easier for a person to write a program
to manipulate {\it object programs} when the object programs are
in internal language (i.e. represented as S-expressions).
External language is a notation that is compact and easier
for people to read and write. However, external language programs
are not S-expressions, and therefore it is not easy to write \lisp\
programs to generate, interpret, compile or otherwise manipulate
programs written in external language. (An additional advantage of
the external notation is that it is very similar to the language used
to express facts about the abstract domain of S-expressions given
in \chapref{PROVIN}.
[Remark: most \lisp\ implementations accept only internal
language programs.]
The simplest \lisp\ terms are variables and constants.
In the external language the notation for S-expression constants
is that described in the preceeding sections. (S-expression constants
will always appear in the special font used in the previous sections.)
In the internal language there is a possible ambiguity as to whether
an S-expression is to represent itself, or the value of the \lisp\ term
that it represents (if any). \lisp\ takes the latter point of view.
An S-expression constant is represented by a list whose first element
is the atom {\sx QUOTE} and whose second element is the S-expression.
Thus the S-expression constant {\sx (A~B)} is represented by the S-expression
{\sx (QUOTE~(A~B))}.
Variables are written in external notation as lower case italic identifiers.
Frequently just single letters such as $x$ and $u$ are used, but sometimes we
use names suggestive of the kind of object the variable represents.
The corresponding internal form will be the atom whose name is the same
identifier but in upper case (and appearing in the S-expression font).
Thus {\sx X} corresponds to $x$ and {\sx U} to $u$.
We often use $x$, $y$, $z$ to range over arbitrary S-expressions
and $u$, $v$, $w$ to range over lists.
Additional terms (called application terms) can be formed denoting
the result of one of the basic \lisp\ programs. These are expressed in external
language using ordinary mathematical notation for the application of functions
and predicates to a suitable number of arguments. Program names are used
for function symbols and variables or S-expression constants as arguments.
Argument lists will be surrounded by ``[\ ]'' thus reserving parentheses for
S-expressions. The brackets will often be omitted for single arguments.
The internal notation for application terms is more uniform. Namely
such an application term is represented by a list, where the first element
is the atom corresponding to the program name and the remaining elements
of the list represent the arguments.
\tabref{lsptrm} gives some examples of \lisp\ terms. Each
term is shown in external and internal notation along with the S-expression
value that it denotes. Note that no variables appear in these examples because
the value denoted by a \lisp\ term containing a variable depends on
the value assigned to the variable.
\begin{table}
\label{t1}
\halign{#\hfil&\quad$#$\hfil&\quad\sx # \hfil&\quad \sx #\hfil\cr
& {\rm External\ form} &{\rm Internal form}&{\rm Value denoted}\cr
\noalign{\vskip-\smallskipamount}
& \hrulefill & \hrulefill & \hrulefill \cr
\noalign{\medskip}
(i) &|(A . B)|& (QUOTE (A . B)) & (A . B) \cr
&|(CAR X)| & (QUOTE (CAR X)) & (CAR X) \cr
\noalign{\medskip}
(iia) & \qa |(A . B)| & (CAR (QUOTE (A . B))) & A\cr
(iid) & \qd |(A . B)| & (CDR (QUOTE (A . B))) & B\cr
(iic) & |A| \qcons |B| & (CONS (QUOTE A) (QUOTE B)) & (A . B)\cr
\noalign{\medskip}
(iiia) & \qa |((A . B) . A)| & (CAR (QUOTE ((A . B) . A))) &(A . B)\cr
(iiid) & \qd |((A . B) . A)| & (CDR (QUOTE ((A . B) . A))) & A\cr
(iiic) & |(A . B)| \qcons |A| & (CONS (QUOTE (A . B)) (QUOTE A)) & ((A . B) . A)\cr
\noalign{\medskip}
(iva) & \qa |(A B C D)| & (CAR (QUOTE (A B C D))) & A\cr
(ivd) & \qd |(A B C D)| & (CDR (QUOTE (A B C D))) & (B C D)\cr
(ivc) & |A| \qcons |(B C D)| & (CONS (QUOTE A) (QUOTE (B C D))) & (A B C D)\cr
\noalign{\medskip}
(v) & \qat |A| & (ATOM (QUOTE A)) & T\cr
& \qat |(A . B)| & (ATOM (QUOTE (A . B))) & NIL\cr
\noalign{\medskip}
(vii) & |A| \qeq |A| & (EQ (QUOTE A) (QUOTE A)) & T\cr
& |A| \qeq |B| & (EQ (QUOTE A) (QUOTE B)) & NIL\cr
& |(A . B)| \qeq |(A . B)| & (EQ (QUOTE (A . B)) (QUOTE (A . B))) & ?\cr
}
\vbox{\strut}
\hrule
\caption{Examples of \lisp\ terms in external and internal notation
and their values.}
\end{table}
We now describe the basic \lisp\ programs and give their external and
internal names. The external names are generally abbreviations for the
internal names and will appear in bold face type.
We begin with the programs for the selector functions.
$car$ is represented externally by \qa and internally by
{\sx CAR}. When applied to a non-atomic list structure the result is the a-part.
See \tabref{lsptrm} (iia) and (iiia) for examples.
$cdr$ is represented externally by \qd and internally by
{\sx CDR}. When applied to a non-atomic list structure the result is the d-part.
See \tabref{lsptrm} (iid) and (iiid) for examples.
The action of the selector operations on atoms is not defined in basic \lisp.
However, most implementations assign a some ``meaning'' to such terms, possibly
an error message.
Since lists are a particular kind of S-expression, the action of
the selector operations on non-empty lists is also determined by the above
definitions. In particular \qa selects the first element of the list and
\qd selects the rest of the list (e.g. the list formed by removing that
first element).
See \tabref{lsptrm} (iva) and (ivd).
Recall that the a-part of a non-empty list can be any S-expression, while the
d-part is always a list. Thus the selectors behave symmetrically on
S-expressions but unsymmetrically on lists reflecting the symmetry properties
of the list structure representation of S-expressions and lists.
[Historical note:
the names ``CAR'' and ``CDR'' stood for
``contents of the address part of register''
and ``contents of the decrement part of register'' in a 1957 precursor of \lisp
projected for the IBM 704 computer. The names have persisted for
lack of a clearly preferable alternative.]
{\it cons} is represented externally by an infix operator
\qcons\ and internally by {\sx CONS}. The program for {\it cons} must
have access to a supply of cons-cells called the free storage list.
Given pointers to two existing list structures, the program
removes a cons-cell from the free storage list and puts the first pointer
(address) into the a-part and the second into the d-part of this cell.
The result is the S-expression represented by the pointer to the new cons-cell.
See \tabref{lsptrm} (iic) and (iiic).
We see that for lists, {\it cons} is a prefixing operation.
Namely, if the second argument is a list then the result of applying {\it cons}
is the list obtained by putting the first argument onto
the front of that list.
See \tabref{lsptrm} (ivc).
[Note that giving the same pair of pointers to the {\it cons} program a
second time will result in the construction of
a list structure distinct from the first, although both represent the
same S-expression.]
The predicate {\it atom} is represented externally by \qat\ and
internally by {\sx ATOM}. The program for {\it atom} when given a pointer to
a list structure determines whether that list structure represents
and atom or not. The result is \qT\ if the list structure
is atomic and \qNIL\ otherwise.
See \tabref{lsptrm} (v).
The predicate {\it eq} is represented externally by the infix \qeq\
and internally by {\sx EQ}. As mentioned earlier, it is only required to
test for equality of atoms. The program for {\it eq} actually does a bit more.
Namely, given pointers to two list structures it compares these addresses.
If they are equal the result is \qT, otherwise it is \qNIL.
Since each atom is represented by a unique list structure,
the comparison is indeed a test
of equality if at least one of the arguments is atomic. More generally if
two list structures have the same address, they represent the same
S-expression. The converse is false because there is no guarantee
that the same S-expression is not represented by two different list structures.
(Two applications of {\it cons} to a pair of list structures in most
LISPs will result in two list structures representing the same S-expression,
but with different addresses.)
Equality of S-expressions is tested by a program based on {\it equal} which we will
describe in a later section. If this program were used instead of {\it eq} as
a basic function, \lisp\ would be logically simpler, but many programs
would be slower.
This concludes the description of the basic \lisp\ programs. To conclude
the section we give some further notational extensions, conventions and
abbreviations.
Besides the basic functions of \lisp, there will be user-defined
functions. We haven't given the mechanism for function definition yet, but
suppose a function $subst$ taking three arguments has been defined.
It may be used in terms like $ {\it subst}[x,y,z]$ having internal form
{\sx (SUBST X Y Z)}.
As in other programming languages and in mathematics generally,
application terms can have arbitrary terms as arguments, not just
variables and constants. Thus we have terms like
${\it subst}[x,y,\qa z] \qcons {\it subst}[x,y,\qd z]]$ involving
a user defined function {\it subst}. Its internal form is
{\sx (CONS (SUBST X Y (CAR Z)) (SUBST X Y (CDR Z)))}.
$\qn x$ is an abbreviation for $x \qeq \qNIL$. It rates a special
notation because \qNIL\ plays the same role among lists that zero
plays among numbers, and list variables often have to be tested to
see if their value is \qNIL.
Its internal form is {\sx (NULL X)}.
The notation ${\it list}[x1, x2,\ \dots\ ,xn]$ is used to denote the
composition of {\it cons}'s that forms a list from its elements. Thus
${\it list}[x, y, z]$ denotes ${\it cons}[x, {\it cons}[y, {\it cons}[z, \qNIL]]]$
and forms a list of three elements out of the quantities $x$, $y$, and
$z$.
We often write $\qlist{x, y,\ \ldots\ z}$ instead of ${\it list}[x, y,\ \ldots\ z]$
when this will not cause confusion. The experienced
implementer of programming languages will expect that since {\it list} has a
variable number of arguments, its implementation will pose problems. That
is indeed true. The internal form of
$\qlist{x, y,\ \ldots\ z}$ is {\sx (LIST X Y . . . Z)}.
Compositions of \qa\ and \qd\ are written by concatenating the
letters \qa\ and \qd. Thus, it is easily seen that $\qad u$ denotes
the second element of the list $u$, and $\qadd u$ denotes the third
element of that list. The internal names of these functions are
{\sx CADR} and {\sx CADDR} respectively.
In external language, we often omit brackets for functions
of one argument. Thus $\mkop{alt}~x$ stands for $\mkop{alt}[x]$,
and $\mkop{alt}\mkop{alt}x$
stands for $\mkop{alt}[\mkop{alt}[x]]$ . This convention was also used in the
descriptions of \qa, \qd, \qat, and \qn.
\subsubsection*{Exercise}
\begin{list}{\arabic{list}.}{\leftmargin0pt \labelwidth -\parindent}
\item What is the value of
$\qlist{{\sx A} \qcons \qa {\sx (B . C)}, {\sx D} \qcons \qd {\sx (B . C)}}$?
\item What combinations of \qa\ and \qd\ select {\sx (X Y)} and {\sx (CONS X Y)}
respectively from
$$\hbox{\sx ((LAMBDA (X Y) (CONS X Y)) (QUOTE A) (QUOTE B))}$$
\end{list}
\section{A more abstract view of S-expressions.}
\label{asexp}
In the previous sections, we have explained S-expressions
and the related basic functions and predicates in terms of the
sequences of symbols that represent them on paper or on the
screeen and in terms of the list structures that are used to
represent them in the memory of a computer. Still other representations
are possible, but the basic properties of S-expressions are
independent of the representation.
From the abstract point of view, there is a set {\it A}
of atoms. It isn't necessary to say what the atoms are, because
to pure \lisp\ function programs, we only need provide predicates
for telling whether an S-expression is an atom and for telling
when two occurrences of atoms are occurrences of the same atom.
The set of S-expressions is then defined as the least
set that contains the atoms and whenever it contains two elements
{\it x} and {\it y} also contains the ordered pair $<x,y>$.
We have here used the conventional mathematical notation for
ordered pairs to emphasize the abstractness. The \lisp\ notation
$(x.y)$ is entirely equivalent.
This definition can be expressed by the set equation
%\zz(1)
$$S = A + S \times S$$
%
where the symbol $\times$ represents, as is conventional in mathematics,
the operation of taking the set of ordered pairs of elements of
two sets, and the operation $+$ represents the {\it disjoint union}
of two sets. The word {\it disjoint} in {\it disjoint union}
means that the two sets are forced to have no elements in
common, by adding different labels to their elements. In this
case, we are merely making sure that no atom is simultaneously
a pair.
The basic functions and predicates of \lisp\ now come up
entirely naturally. We can tell whether an element is an
atom and the predicate for this is \qat. We can tell whether
two atoms are equal, and we will later show how the predicate
for telling whether two S-expressions are equal is built up
from this. If we have a pair, we can take its two components
with \qa\ and \qd. Finally, if we have two S-expressions, we
can form the pair with {\it cons}, also represented by the
infix \qcons.
Pure \lisp\ functions and predicates are defined in terms
of these abstract S-expressions, and this makes it easier to
prove things about them. However, practical \lisp\ systems, e.g.
\clisp, also distinguish many kinds of atoms (e.g. symbols,
numbers of various kinds, strings and arrays) and have
predicates that distinguish the kinds and function that can
be used to compute with the different kinds (e.g. the numerical
operations).
The material in the rest of this section is not used
further in the book, so skip it if you find it
uninteresting or too obscure.
The equal sign in equation \zz(1) represents a 1-1
correspondence between the sets on the two sides. The symbols
$+$ and $\times$ suggest the distributive laws
%
$$A\times (B + C) = A\times B + A\times C$$
%
and
%
$$(B + C)\times A = B\times A + C\times A,$$
%
and these are correct in the sense of 1-1 correspondence. We
can substitute $A + S\times S$ for {\it S} in \zz(1) and get
%
\zz(2) $$S = A + (A + S\times S)\times (A + S\times S)$$
%
and expand by the distributive laws to get
%
\zz(3) $$S = A + A\times A + A\times S\times S + S\times S\times A,$$
%
wherein an appropriate associative law justifies ignoring the grouping.
Expanding this further and collecting terms with numerical co-efficients
gives
%
\zz(4) $$S = A + A↑2 + 2A↑3 + 5A↑4 + 14A↑5 + etc.$$
The interpretation the co-efficients of this equation is that there
are, for example, two ways of making an S-expression out of 3 atoms
in a given order, namely $((x.y).z)$ and $(x.(y.z))$, and 5 ways of
making an S-expression out of 4 atoms, namely $(((w.x).y).z)$, $((w.(x.y)).z)$,
$((w.x).(y.z))$, $(w.((x.y).z))$ and $(w.(x.(y.z)))$, and 14 ways of
making an S-expression out of 5 atoms, etc.
The general formula for how many ways there are of making
an S-expression out of {\it n} atoms may be obtained as follows.
Regard \zz(1) as a quadratic equation for {\it S}, i.e.
%
$$S↑2 - S + A = 0,$$
%
and solve it by the quadratic formula, getting
%
$$S = 1/2(1 \pm \sqrt{(1 - 4A)}).$$
%
Then expand the square root by the
binomial theorem getting an infinite series. Choose the $-$ sign for
the root so that the co-efficients come out positive. A calculation
calculation shows that the {\it n}th co-efficient is given by
%
$$C↓n = {(2n-2)! \over {n!(n-1)!}}.$$
%
These numbers are called Catalan numbers.
\noindent{\bf Mathematical exercise}: Carry out the computation of the
Catalan number formula, and
explain why this strange procedure of applying the quadratic formula to an
equation involving sets gives the right answer.
\section{Conditional terms.}
\label{cond}
In \lisp\ defining functions, predicates and terms by cases is done by
using conditional terms.In external language a simple
conditional term has the form
%
$$\qif e_{test} \qthen e_{then} \qelse e_{else}$$
%
where $e_{test}$ , $e_{then}$, $e_{else}$ can be any terms.
The conditional term $\qif e_{test} \qthen e_{then} \qelse e_{else}$
is evaluated as follows: first $e_{test}$ is evaluated and determined to be
\qtrue\ or \qfalse. If $e_{test}$ is \qtrue, then $e_{then}$ is evaluated and its
value is the value of the expression. If $e_{test}$ is \qfalse, then $e_{else}$ is
evaluated and its value is the value of the expression. Note that if
$e_{test}$ is \qtrue, $e_{else}$ is never evaluated, and if $e_{test}$ is \qfalse, then
$e_{then}$ is never evaluated. The importance of this will be explained later.
In all significant implementations of \lisp\ the meanings of \qtrue\ and
\qfalse\ are interpreted as follows. If the term $e_{test}$ evaluates to
the atom |NIL| then the term is \qfalse, if it evaluates to any other \lisp\
object then it is interpreted as being \qtrue.
There is a symbol |T| that is often used as the generic \qtrue. However many \lisp\
predicates do not always evaluate to either |NIL| or |T|. Their values can often
be more useful.
A familiar function that can be written conveniently using conditional
expressions is the absolute value of a real number. We have
$$\|x\| = \qif x<0 \qthen -x \qelse x.$$
[Remark: The common mathematical notation for such definitions is
$$\|x\| = \cases{x&if\ $x \ge 0$\cr-x&otherwise}$$
but the right hand side is not allowed to be part of another expression.]
More generally a conditional term has the form
$$\qif p_{1} \qthen e_{1} \qelse
\qif p_{2} \qthen e_{2} \ldots \qelse
\qif p_{n} \qthen e_{n} \qelse e_{n+1}$$
There can be any number of clauses. The value is determined by
evaluating the propositional terms $p_1$, $p_2$, etc. until a true term is
found, the value is then that of the term following the next \qthen.
If none of the propositional terms is true, then the value is that of
the term following the last \qelse.
\figref{f6} shows an example of a (numeric) function defined
using a conditional expression.
\begin{figure}
\label{f6}
$$??\hbox{Graph of triangle function}??$$
%
% (0,1)
% ≤'`≥
% ≤' `≥
% ↑ ≤' `≥
% ~ ≤' `≥
% tri[x] ~ αααααααααααααααα' `αααααααααααααααα
% ~ (-1,0) (1,0)
% ααα→
% x
% .SKIP 2
%
$$tri[x] = \qif x < -1 \qthen 0 \qelsif x<0 \qthen 1+x \qelsif x<1
\qthen 1-x \qelse 0.$$
\caption{The triangle function.}
\end{figure}
The conditional term in internal language has at least the following two
forms
%
$$\hbox{\sx (COND ($p_1$ $e_1$) ($p_2$ $e_2$) \dots ($p_n$ $e_n$))}$$
%
or
%
$$\hbox{\sx (IF $e_{test}$ $e_{then}$ $e_{else}$)}$$
%
In the |COND| form there can be any number of $p-e$ pairs. Its value is determined
by evaluating the
$p$'s successively until one is found with a value different from |NIL|.
The value of the
corresponding $e$ is then taken as the value of the conditional term.
If all the $p$'s are |NIL|, then the value of the conditional
term is undefined. (In some implementations the conditional term
is given a default value such as \qNIL\ if none of the $p$'s are true.)
The |IF| form is evaluated in the same fashion as the external conditional.
The choice of the particular form to be used in actual programs is a matter of
style and taste.
Conditional expressions may be compounded with functions to
get terms like
%
$$\qif \qn u \qthen v \qelse \qa u \qcons append[\qd u, v]$$
%
involving a yet to be defined function $append$. One of the internal
forms of this is
%
$$|(COND ((NULL U) V) (T (CONS (CAR U) (APPEND (CDR U) V))))|$$
%
another is
%
$$|(IF (NULL U) V (CONS (CAR U) (APPEND (CDR U) V)))|$$
%
One would normally have expected to write
{\sx (QUOTE T)}, but there is a special convention that
\qT\ may be written without {\sx QUOTE}. This convention applies to \qNIL\ also.
This means that these symbols cannot be used as variables.
So for example it is impossible to "fall off the end" of a conditional
expression with \qT as its last $e$. It is the translation
into internal form of the final \qelse\ of a conditional
expression in external form.
Using the conditional and the \lisp\ constants |NIL| and |T| we can
define
conjunction(\qand), disjunction(\qor) and negation(\qnot).
Both \qand\ and \qor\ are associative, so we can write expressions
like $p_1 \qand p_2 \qand p_3$ without worrying about the grouping.
The evaluation of propositional terms is defined using conditional
terms as follows:
%
$$\vbox{\ialign{\hfil $#$&$\>=#$\hfil\cr
p \qand q & \qif p \qthen q \qelse \qNIL\cr
p \qor q & \qif p \qthen p \qelse q\cr
\qnot p & \qif p \qthen \qNIL \qelse \qT\cr
}}$$
%
The internal forms of these definitions are
%
$$\vbox{\halign{\hfil \sx # &$=$\ \sx #\hfil\cr
(AND P Q) & (IF P Q NIL)\cr
(OR P Q) & (IF P P Q) \cr
(NOT P) & (IF P NIL T)\cr
}}$$
%
Note that if $p$ is false then $p \qand q = \qNIL$ independent of the value of
$q$, and likewise if $p$ is true, then $p \qor q = \qT$ is independent of
$q$. If $p$ has been evaluated and found to be false, then $q$ does
not have to be evaluated at all to find the value of $p \qand q$, and, in
fact, \lisp\ does not evaluate $q$ in this case. Similarly, $q$ is not
evaluated in evaluating $p \qor q$ if $p$ is true. This procedure is in
accordance with the above conditional expression descriptions of
these operators. The importance of this convention, which contrasts
with that of ALGOL 60, will be apparent later when we discuss
recursive definitions of functions and predicates.
We conclude this section by remarking that most
\lisp\ implementations allow \qand\ and \qor\ to have arbitrary numbers
of arguments.
\section{Recursive function definitions.}
\label{recdefn}
In such languages as FORTRAN and ALGOL, computer programs
are expressed in the main as sequences of assignment statements and
conditional {\bf go to}'s. In \lisp, programs are mainly expressed in the
form of recursively defined functions. We begin with an example.
We want a function \mkop{alt} such that $\mkop{alt} u$ gives a list whose
elements are
alternate elements of the list $u$ beginning with the first. For example
we want
$$\vbox{\halign{\hfil$#\;$&$=$ {\sx #}\hfil\cr
\mkop{alt} |(A B C D E)| & (A C E)\cr
\mkop{alt} |((A B) (C D))| & ((A B))\cr
\mkop{alt} |(A)| & (A)\cr
\mkop{alt} \qNIL & \qNIL\cr
}}$$
%
and in \lisp\ we can define \mkop{alt} by the recursion equation
$$\edefun{alt}{%
(DEFUN ALT (U)
(IF (OR (NULL U) (NULL (CDR U))) U (CONS (CAR U) (ALT (CDDR U)))) )
}$$
$$\mkop{alt} u \leftarrow \qif \qn u \qor \qn \qd u \qthen u
\qelse \qa u \qcons \mkop{alt} \qdd u$$
In case you need a review of our precedence conventions, fully bracketed
it looks like
%
$$\mkop{alt}[u] \leftarrow [\qif [\qn [u] \qor \qn [\qd [u]]] \qthen u \qelse [\qa [u] \qcons \mkop{alt}[\qdd [u]]]]$$
%
The terms appearing in the recursion equation are formed by the methods
previously discussed. Notice that the function \mkop{alt} occurs in
the right hand side of its own defining equation and
that we use ``$\leftarrow$'' instead of ``$=$''. The recursion equation
tells us that \mkop{alt} is a function of one variable $u$, and that the value of
$\mkop{alt} u$ for some particular list is computed by evaluating the right hand
side of the definition, substituting that list for $u$ whenever $u$ is
encountered and re-evaluating the right hand side with a new
$u$ whenever a term $\mkop{alt} e$ is encountered.
This process is best illustrated by evaluating some expressions.
Consider evaluating $\mkop{alt} \qNIL$.
We do it by evaluating the expression to the right of the ``$\leftarrow$''
in \defref{alt}
remembering that the variable $u$ has the value \qNIL. A conditional
expression is evaluated by first evaluating the first test
term - in this case $\qn u \qor \qn \qd u$.
This expression is a disjunction
so we first evaluate the first disjunct, namely $\qn u$.
Since $u = \qNIL$, $\qn u$ is true, the disjunction is true, and the value
of the conditional expression is the value of the term after the
first true test-term. This term is $u$, and its value is
\qNIL, so the value of $\mkop{alt}[\qNIL]$ is \qNIL. Obeying
the rules about the order of evaluation of terms in conditional
expressions is important in this case since if we didn't obey
these rules, we might find ourselves trying to evaluate $\qn \qd u$ or
$\mkop{alt}[\qdd u]$, and since $u$ is \qNIL, neither of these has a value.
(An attempt to evaluate them would result in an error return in
most LISPs.)
As a second example, consider $\mkop{alt} |(A B)|$. Since neither
$\qn u$ nor $\qn \qd u$ is true in this case, the disjunction $\qn u \qor \qn \qd u$
is false and the value of the expression is the value of
$\qa u \qcons \mkop{alt}\qdd u$.
$\qa u$ has the value {\sx A}, and $\qdd u$ has the value
\qNIL, so we must now evaluate $\mkop{alt} \qNIL$, and we already know that this
is \qNIL. Therefore, the value of
$\mkop{alt} |(A B)|$ is that of $|A| \qcons \qNIL$ which is {\sx (A)}.
This evaluation is described by the following sequence of equations:
%
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
\mkop{alt} |(A B)| &
\qif [\qn |(A B)| \qor \qn \qd |(A B)|] \qthen |(A B)|
\qelse \qa |(A B)| \qcons \mkop{alt}[\qdd |(A B)|]\cr
& \qif [\qfalse \qor \qn \qd |(A B)|] \qthen |(A B)|
\qelse \qa |(A B)| \qcons alt[\qdd |(A B)|]\cr
& \qif \qfalse \qthen |(A B)| \qelse \qa |(A B)| \qcons \mkop{alt}[\qdd |(A B)|]\cr
& \qa |(A B)|\qcons \mkop{alt}[\qdd |(A B)|]\cr
& |A| \qcons \mkop{alt}[\qNIL]\cr
& |A| \qcons \qif [\qn \qNIL \qor \qn \qd \qNIL] \qthen \qNIL
\qelse \qa \qNIL \qcons \mkop{alt}[\qdd \qNIL]\cr
& |A| \qcons \qif [\qtrue \qor \qn \qd \qNIL] \qthen \qNIL
\qelse \qa \qNIL \qcons \mkop{alt}[\qdd \qNIL]\cr
& |A| \qcons \qif \qtrue \qthen \qNIL \qelse \qa \qNIL \qcons \mkop{alt}[\qdd \qNIL]\cr
& |A| \qcons \qNIL\cr
& |(A)|\cr
}}$$
This is still very long-winded, and now that the reader
understands the order of evaluation of conditional
expressions, we can proceed more briefly to evaluate
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
{\it\mkop{alt}}[|(A B C D E)|] & |A| \qcons \mkop{alt}[|(C D E)|]\cr
& |A| \qcons [|C| \qcons \mkop{alt}[|(E)|]]\cr
& |A| \qcons [|(C E)|]\cr
& |(A C E)|\cr
}}$$
Here are three more examples of recursive functions and their
application. We define \mkop{last} by
$$\edefun{last}{%
(DEFUN LAST (U) (COND ((NULL (CDR U)) (CAR U)) (T (LAST (CDR U))) ))
}$$
$$\mkop{last} u \leftarrow \qif \qn \qd u \qthen \qa u \qelse \mkop{last} \qd u$$
and we compute
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
\mkop{last} |(A B C)| &
\qif \qn \qd |(A B C)| \qthen \qa |(A B C)| \qelse \mkop{last} \qd |(A B C)|\cr
& \mkop{last} |(B C)|\cr
& \mkop{last} |(C)|\cr
& |C|\cr
}}$$
%
Clearly \mkop{last} computes the last element of a list. $\mkop{last}\qNIL$ is
undefined in the LISP language; the result of trying to compute it
may be an error message or may be some random result depending on the
implementation. (A heavy duty version of \mkop{last} would explicitly
call a function called {\bf error} with a string expressing the complaint
that the program had tried to compute $\mkop{last} \qNIL$.)
The function \mkop{subst} is defined by
$$\edefun{subst}{%
(DEFUN SUBST (X Y Z)
(COND ((ATOM Z) (COND ((EQ Z Y) X) (T Z)))
(T (CONS (SUBST X Y (CAR Z)) (SUBST X Y (CDR Z)))))),
}$$
%
$$\mkop{subst}[x, y, z] \leftarrow
\qif \qat z \qthen [\qif z \qeq y \qthen x \qelse z]
\qelse \mkop{subst}[x,y,qa z] \qcons \mkop{subst}[x,y,\qd z]$$
%
We have
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
\mkop{subst}[|(A.B)|, |X|, |((X.A).X)|] &
\mkop{subst}[|(A.B)|, |X|, |(X.A)|] \qcons \mkop{subst}[|(A.B)|, |X|, |X|]\cr
& [\mkop{subst}[|(A.B)|, |X|, |X|]
\qcons \mkop{subst}[|(A.B)|, |X|, |A|]] \qcons |(A.B)|\cr
& [[|(A.B)|] \qcons |A|] \qcons |(A.B)|]\cr
& |(((A.B).A).(A.B))|\cr
}}$$
%
\mkop{subst} computes the result of substituting the S-expression $x$ for
the atom $y$ in the S-expression $z$. This operation is important
in all kinds of symbolic computation.
The function $\mkop{append}[u,v]$ which gives the concatenation of the
lists $u$ and $v$ is also important. It is also denoted by the
infixed expression $u \qapp v$. For example we have
%
$$|(A B C)| \qapp |(D E F)| = |(A B C D E F)|$$
$$\qNIL \qapp |(A B)| = |(A B)| = |(A B)| \qapp \qNIL$$
%
and the formal definition
$$\edefun{append}{%
(DEFUN APPEND (U V)
(COND ((NULL U) V) (T (CONS (CAR U) (APPEND (CDR U) V))) ))
}$$
$$u \qapp v \leftarrow \qif \qn u \qthen v \qelse \qa u \qcons [\qd u] \qapp v$$
The \mkop{append} function is machine coded in most Lisp systems
in a way that allows an arbitrary number of arguments, e.g.
$$|(APPEND (QUOTE (A B)) (QUOTE (C D)) (QUOTE (E F)))|\ {\rm is}\ |(A B C D E F)|$$
It is convenient to use an infix for \mkop{append} because it is associative, i.e.
$$u \qapp [v \qapp w] = [u \qapp v] \qapp w$$
so we can just write $u \qapp v \qapp w$.
Infix notation for a recursively defined function is especially
appropriate when the function is associative, but we often use it
when it is mathematically conventional or just convenient.
The propositional operations can also be used in making recursive
definitions since we take advantage of the order of evaluation of
constituents. Thus, we define a predicate \mkop{equal} by
%
$$\edefun{equal}{%
(DEFUN EQUAL (X Y)
(OR (EQ X Y)
(AND (NOT (ATOM X))
(NOT (ATOM Y))
(EQUAL (CAR X) (CAR Y))
(EQUAL (CDR X) (CDR Y)))))
}$$
$$\mkop{equal}[x,y] \leftarrow
x \qeq y \qor [\qnot \qat x \qand \qnot \qat y \qand
\mkop{equal}[\qa x,qa y] \qand \mkop{equal}[\qd x, \qd y]]$$
%
$\mkop{equal}[x,y]$ is true if and only if $x$ and $y$ are the same
S-expression, and the use of this predicate makes up for the fact
that the basic predicate \qeq\ is guaranteed to test equality only
when one of the operands is known to be an atom.
Membership of an S-expression $x$ in a list $u$ is tested by
$$\edefun{member}{%
(DEFUN MEMBER (X U) (IF (NULL U) NIL (IF (EQUAL X (CAR U)) U (MEMBER X (CDR U)))))
}$$
%!fcnmember& ⊗⊗⊗member[x, u] ← \qnot \qn u \qand [[x = \qa u] \qor member[x, \qd u]]⊗.
This relation is also denoted by $x \qmem u$. Notice that \mkop{member}[x,u] does
not just return |T| or |NIL|, it returns the tail of the list u
beginning with the leftmost occurance of x if
it apppears else it returns |NIL|.
Here are some computations:
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
\mkop{member}[|B|, |(A B)|] &
\qif \qn |(A B)| \qthen \qif [[|B| \qequal \qa |(A B)|] \qthen |(A B)|
\qelse \mkop{member}[|B|, \qd |(A B)|]]\cr
& \mkop{member}[|B|, |(B)|]\cr
& |(B)|\cr
}}$$
Sometimes a function is defined with the help of auxiliary
functions. Thus, the reverse of a list $u$,
(e.g. $\mkop{reverse}[|(A B C D)|] = |(D C B A)|$),
is given by the pair of equations
%
$$\edefun{reverse}{%
(DEFUN REVERSE (U) (REV U NIL))
(DEFUN REV (U V) (COND ((NULL U) V) (T (REV (CDR U) (CONS (CAR U) V))) ))
}$$
%
% reverse[u] ← rev[u, \qNIL]⊗
% !fcnreverse& ⊗⊗⊗rev[u, v] ← \qif \qn u \qthen v \qelse rev[\qd u, [\qa u].v]⊗.
A computation is:
$$\vbox{\halign{\hfil$#$&$\>= #$\hfil\cr
\mkop{reverse}[|(A B C)|] & \mkop{rev}[|(A B C)|, \qNIL]\cr
& \mkop{rev}[|(B C)|, |(A)|]\cr
& \mkop{rev}[|(C)|, |(B A)|]\cr
& \mkop{rev}[\qNIL, |(C B A)|]\cr
& |(C B A)|\cr
}}$$
A more elaborate example of recursive definition is given by
$$\edefun{flatten}{%
(DEFUN FLATTEN (X) (FLAT X NIL))
(DEFUN FLAT (X U) (COND ((ATOM X) (CONS X U)) (T (FLAT (CAR X) (FLAT (CDR X) U))) ))
}$$
%
% ⊗⊗⊗flatten[x] ← flat[x, \qNIL] ⊗
% !fcnflatten&
% ⊗⊗⊗flat[x, u] ← \qif \qat x \qthen x.u \qelse flat[\qa x, flat[\qd x, u]]⊗.
%
We have
$$\vbox{\halign{\hfil$ #$&$\>= #$\hfil\cr
\mkop{flatten}[|((A.B).C)|] & \mkop{flat}[|((A.B).C)|, \qNIL]\cr
& \mkop{flat}[|(A.B)|, \mkop{flat}[|C|, \qNIL]]\cr
& \mkop{flat}[|(A.B)|, |(C)|]\cr
& \mkop{flat}[|A|, \mkop{flat}[|B|, |(C)|]]\cr
& \mkop{flat}[|A|, |(B C)|]\cr
& |(A B C)|\cr
}}$$
The reader will see that the value of $\mkop{flatten}[x]$ is a list of the
atoms of the S-expression $x$ from left to right. Thus
$\mkop{flatten}[|((A B) A)|] = |(A B NIL A NIL)|$.
Many functions can be conveniently written in more than one
way. For example, the function \mkop{reverse} mentioned above can be
defined without an auxiliary function as follows:
$$\edefun{reverse1}{%
(DEFUN REVERSE1 (U)
(COND ((NULL U) NIL) (T (APPEND (REVERSE1 (CDR U)) (LIST (CAR U)))) ))
}$$
$$\mkop{reverse1} u \leftarrow \qif \qn u \qthen \qNIL \qelse \mkop{reverse1}
\qd u \qapp \qlist{\qa u}$$
%
but the earlier definition involves less computation,
because \qapp\ takes time proportional to the length of its first argument.
Similarly the function computed by \mkop{flatten} can also be computed by the
simpler but less efficient program \mkop{fringe}
$$\edefun{fringe}{%
(DEFUN FRINGE (X)
(COND ((ATOM X) (LIST X)) (T (APPEND (FRINGE (CAR X)) (FRINGE (CDR X)))) ))
}$$
$$\mkop{fringe} x \leftarrow \qif \qat x \qthen \qlist{x}
\qelse \mkop{fringe} \qa x \qapp \mkop{fringe} \qd x$$
The use of conditional expressions for recursive function
definition is not limited to functions of S-expressions. For
example, the factorial function and the Euclidean algorithm for the
greatest common divisor on the non-negative integers may be written
as follows:
$$\edefun{factorial}{%
(DEFUN FACTORIAL (N)
(COND ((EQUAL N 0) 1) (T (* N (FACTORIAL (1- N)))) ))
}$$
$$n! \leftarrow \qif n \qeq 0 \qthen 1 \qelse n \times [n-1]!$$
and
$$\edefun{factorial}{%
(DEFUN GCD (M N)
(COND ((> M N) (GCD N M))
((EQUAL M 0) N)
(T (GCD (MOD N M) M)) ))
}$$
$$\mkop{gcd}[m, n] \leftarrow
\qif m \qgt n \qthen \mkop{gcd}[n, m] \qelse \qif m \qeq 0 \qthen n
\qelse \mkop{gcd}[n \mkop{mod} m, m]$$
%
where $n \mkop{mod} m$ denotes the remainder when $n$ is divided by $m$ and
may itself be expressed recursively by
$$\edefun{mod}{%
(DEFUN MOD (N M)
(COND ((< N M) N) (T (MOD (- N M) M)) ))
}$$
$$n \mkop{mod} m \leftarrow \qif n \qlt m \qthen n \qelse [n-m] \mkop{mod} m$$
%
(Note that these definitions must be modified if negative integers
are to be included in the domain.)
The internal form of function definitions depends on the implementation.
\clisp\ uses the form
%
$${\sx (DEFUN}\ \langle \hbox{\rm function--name} \rangle\quad
\langle \hbox{\rm list--of--variables} \rangle\quad
\langle \hbox{\rm right--hand--side} \rangle{\sx )}.$$
%
Thus in \clisp\ the definition of \mkop{subst} is
%
\begintt
(DEFUN SUBST (X Y Z)
(COND ((ATOM Z) (COND ((EQ Z Y) X) (T Z)))
(T (CONS (SUBST X Y (CAR Z)) (SUBST X Y (CDR Z))))))
\endtt
and the definition of \mkop{alt} is
\begintt
(DEFUN ALT (U)
(IF (OR (NULL U) (NULL (CDR U))) U) (CONS (CAR U) (ALT (CDDR U)))).
\endtt
\subsubsection*{Exercises}
\begin{list}{\arabic{list}.}{\leftmargin0pt \labelwidth -\parindent}
\item Consider the function \mkop{drop} defined by
$$\mkop{drop}[u] \leftarrow \qif \qn u \qthen \qNIL \qelse \qlist{\qa u}
\qcons \mkop{drop}[\qd u].$$
Compute (by hand) $\mkop{drop}[|(A B C)|]$. What does \mkop{drop} do to
lists in general? Write \mkop{drop} in internal notation using {\sx DEFUN}.
\item What does the function
%
$$\mkop{r2}[u] \leftarrow
\qif \qn u \qthen \qNIL \qelse \mkop{reverse}[\qa u] \qcons \mkop{r2}[\qd u]$$
%
do to lists of lists? How about
%
$$\mkop{r3}[x] \leftarrow
\qif \qat x \qthen x \qelse \mkop{reverse}[\mkop{r4}[x]]$$
$$\mkop{r4}[u] \leftarrow
\qif \qn u \qthen \qNIL \qelse \mkop{r3}[\qa u] \qcons \mkop{r4}[\qd u]?$$
\item Compare
%
$$\mkop{r3'}[x] \leftarrow
\qif \qat x \qthen x
\qelse \qlist{\mkop{r3'}[\qd x]} \qapp \qlist{\mkop{r3'}[\qa x]}$$
%
with the function \mkop{r3} of the preceding example.
\item Consider \mkop{r5} defined by
%
$$\mkop{r5}[u] \leftarrow \qif \qn u \qor \qn \qd u \qthen u
\qelse [\qa \mkop{r5}[\qd u]] \qcons \mkop{r5}[\qa u \qcons \mkop{r5}[\qd \mkop{r5}[\qd u]]]$$
%
Compute $\mkop{r5}[|(A B C D)|]$. What does \mkop{r5} do?
It isn't a good way of computing this function even though it
involves no auxiliary functions. [This ingenious program was discovered
by S. Ness].
\end{list}
% "answers"
% ⊗r2 returns the list with each of its elements reversed.
% ⊗r3, ⊗\mkop{r4} maps a general list structure into one with 1st 3rd ... oddth level
% lists reversed, even level lists in original order.
% ⊗r3' returns the mirror image of an S-expression
% ⊗\mkop{r5} reverses a list.